Salient Region Segmentation
Saliency prediction is a well-studied problem in computer vision. Early
saliency models were based on low-level hand-crafted features derived from
insights gained in neuroscience and psychophysics. In the wake of the deep
learning breakthrough, a new cohort of models was proposed based on neural
network architectures, yielding significantly more accurate gaze prediction
than previous shallow models on all metrics.
However, most models treat saliency prediction as a \textit{regression}
problem, and accurate regression of high-dimensional data is known to be a
hard problem. Furthermore, it is unclear that intermediate levels of saliency
(i.e., neither very high nor very low) are meaningful: something is either
salient, or it is not.
Drawing on these two observations, we reformulate the saliency prediction
problem as a salient region \textit{segmentation} problem. We demonstrate that
this reformulation allows for faster convergence than the classical regression
formulation, while performance remains comparable to the state of the art.
We also visualise the general features learned by the model, which are shown
to be consistent with insights from psychophysics.
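The binarisation at the heart of the reformulation can be sketched as follows. This is a minimal illustration only: the 0.5 threshold and the min-max normalisation are assumptions for the sketch, not the paper's exact procedure.

```python
import numpy as np

def saliency_to_mask(saliency, threshold=0.5):
    """Binarise a continuous saliency map into a salient-region mask.

    Intermediate saliency values are discarded: each pixel becomes
    either salient (1) or not (0), matching the segmentation view.
    """
    saliency = np.asarray(saliency, dtype=float)
    # Min-max normalise to [0, 1] before thresholding (assumed step).
    rng = saliency.max() - saliency.min()
    if rng > 0:
        saliency = (saliency - saliency.min()) / rng
    return (saliency >= threshold).astype(np.uint8)

# A toy 3x3 saliency map with low, intermediate, and high values.
toy = [[0.1, 0.4, 0.9],
       [0.2, 0.6, 0.8],
       [0.0, 0.3, 0.7]]
mask = saliency_to_mask(toy)
```

The resulting binary mask can then be trained against with a segmentation loss (e.g., per-pixel cross-entropy) instead of regressing the continuous map.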
Temporal accumulation of oriented visual features
In this paper we present a framework for accumulating a model of a moving object on-line (e.g., when manipulated by a robot). The proposed scheme is based on Bayesian filtering of local features, jointly filtering position, orientation, and appearance information. The work presented here is novel in two respects: first, we use an estimation mechanism that iteratively updates not only geometric information but also appearance information; second, we propose a probabilistic version of the classical n-scan criterion that allows us to select which features are preserved and which are discarded, while making use of the available uncertainty model.
The accumulated representations have been used in three different contexts: pose estimation, robotic grasping, and a driver assistance scenario.
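The iterative update underlying such Bayesian filtering can be sketched for a single scalar feature attribute (e.g., one appearance channel). This is a generic recursive Gaussian fusion step, shown only to illustrate the idea of jointly refining an estimate and its uncertainty; it is not the paper's full joint pose/orientation/appearance filter.

```python
def bayes_update(mean, var, obs, obs_var):
    """One recursive Gaussian update: fuse a prior estimate (mean, var)
    with a new observation obs whose noise variance is obs_var.

    The gain k weights the innovation by the relative uncertainties,
    and the posterior variance shrinks with every observation.
    """
    k = var / (var + obs_var)            # Kalman-style gain
    new_mean = mean + k * (obs - mean)   # corrected estimate
    new_var = (1.0 - k) * var            # reduced uncertainty
    return new_mean, new_var

# Fuse a prior N(0, 1) with an observation 2.0 of variance 1.0.
m, v = bayes_update(0.0, 1.0, 2.0, 1.0)
```

Iterating this update over successive views is what lets both the estimate and its uncertainty improve as more observations of the object accumulate.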
Human Attention in Image Captioning: Dataset and Analysis
In this work, we present a novel dataset consisting of eye movements and
verbal descriptions recorded synchronously over images. Using this data, we
study the differences in human attention during free-viewing and image
captioning tasks. We look into the relationship between human attention and
language constructs during perception and sentence articulation. We also
analyse attention deployment mechanisms in the top-down soft attention approach
that is argued to mimic human attention in captioning tasks, and investigate
whether visual saliency can help image captioning. Our study reveals that (1)
human attention behaviour differs between free-viewing and image description
tasks, with humans tending to fixate on a greater variety of regions in the
latter; (2) there is a strong relationship between described objects and
attended objects ( of the described objects are attended); (3) a
convolutional neural network used as a feature encoder accounts for
human-attended regions during image captioning to a great extent
(around ); (4) the soft-attention mechanism differs from human attention
both spatially and temporally, and there is low correlation between caption
scores and attention consistency scores, indicating a large gap between
humans and machines with regard to top-down attention; and (5) by integrating
the soft attention model with image saliency, we can significantly improve
the model's performance on the Flickr30k and MSCOCO benchmarks. The dataset
can be found at: https://github.com/SenHe/Human-Attention-in-Image-Captioning
Comment: To appear at ICCV 201
Improving object detection performance using scene contextual constraints
Contextual information, such as the co-occurrence of objects and their spatial and relative-size relationships, provides rich and complex information about digital scenes. It also plays an important role in improving object detection and identifying out-of-context objects. In this work, we present contextual models that leverage contextual information (16 contextual relationships are applied in this paper) to enhance the performance of two state-of-the-art object detectors (Faster RCNN and YOLO). Our models are applied as a post-processing step compatible with most existing detectors, refining the confidences and associated categorical labels without refining bounding boxes. We experimentally demonstrate that our models improve detection performance on the most common dataset used in this field (MSCOCO); PASCAL2012 is also used in some experiments. We also show that iterating the application of our contextual models enhances detection performance further.
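The confidence-refinement idea can be sketched with a toy co-occurrence model. The `CO_OCCUR` weights and the multiplicative rescoring rule below are illustrative assumptions for the sketch, not the paper's learned contextual models; the point is only that labels and confidences are adjusted while boxes are left untouched.

```python
# Toy co-occurrence weights: how much the presence of one label in the
# scene should boost (>1) or suppress (<1) another label's confidence.
# These values are invented for illustration.
CO_OCCUR = {
    ("keyboard", "mouse"): 1.2,   # keyboards often co-occur with mice
    ("keyboard", "horse"): 0.6,   # keyboards rarely co-occur with horses
}

def rescore(detections):
    """Refine detection confidences (not bounding boxes) using pairwise
    co-occurrence weights over all other detected labels in the scene."""
    out = []
    for i, (label, conf) in enumerate(detections):
        weight = 1.0
        for j, (other, _) in enumerate(detections):
            if i != j:
                weight *= CO_OCCUR.get((label, other), 1.0)
        out.append((label, min(conf * weight, 1.0)))  # clamp to [0, 1]
    return out

refined = rescore([("keyboard", 0.5), ("mouse", 0.9)])
```

Here the "keyboard" detection gains confidence because a "mouse" is also present, while the "mouse" score is unchanged since no weight is defined for it.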
Multi-View Object Instance Recognition in an Industrial Context
We present a fast object recognition system that codes shape by viewpoint-invariant geometric relations and appearance information. In our advanced industrial work-cell, the system observes the robot's work space through three pairs of Kinect and stereo cameras, allowing for reliable and complete object information. From these sensors, we derive global viewpoint-invariant shape features and robust color features making use of color normalization techniques.
We show that in such a set-up, our system can achieve high performance with a very low number of training samples, which is crucial for user acceptance, and that the use of multiple views is crucial for performance. This indicates that our approach can be used in controlled but realistic industrial contexts that require, besides high reliability, fast processing and intuitive, easy use on the end-user side.
European Union
Danish Council for Strategic Researc
A framework for probabilistic weather forecast post-processing across models and lead times using machine learning
Forecasting the weather is an increasingly data-intensive exercise. Numerical
Weather Prediction (NWP) models are becoming more complex, with higher
resolutions, and there are increasing numbers of different models in operation.
While the forecasting skill of NWP models continues to improve, the number and
complexity of these models poses a new challenge for the operational
meteorologist: how should the information from all available models, each with
their own unique biases and limitations, be combined in order to provide
stakeholders with well-calibrated probabilistic forecasts to use in decision
making? In this paper, we use a road surface temperature example to demonstrate
a three-stage framework that uses machine learning to bridge the gap between
sets of separate forecasts from NWP models and the 'ideal' forecast for
decision support: probabilities of future weather outcomes. First, we use
Quantile Regression Forests to learn the error profile of each numerical model,
and use these to apply empirically-derived probability distributions to
forecasts. Second, we combine these probabilistic forecasts using quantile
averaging. Third, we interpolate between the aggregate quantiles in order to
generate a full predictive distribution, which we demonstrate has properties
suitable for decision support. Our results suggest that this approach provides
an effective and operationally viable framework for the cohesive
post-processing of weather forecasts across multiple models and lead times to
produce a well-calibrated probabilistic output.
Comment: 17 pages, 9 figures, to be published in Philosophical Transactions of the Royal Society
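Stages two and three of the framework can be sketched as follows. This is a minimal illustration with toy numbers: stage one (learning each model's error profile with Quantile Regression Forests) is omitted, and the quantile levels and values are assumptions for the sketch.

```python
import numpy as np

def combine_quantiles(model_quantiles):
    """Stage 2: quantile averaging. Given per-model forecast quantiles
    (one row per NWP model, one column per quantile level), average
    each quantile level across models."""
    return np.mean(np.asarray(model_quantiles, dtype=float), axis=0)

def predictive_cdf(quantile_levels, quantile_values, x):
    """Stage 3: linear interpolation between the aggregate quantiles
    yields an approximate predictive CDF value P(outcome <= x)."""
    return float(np.interp(x, quantile_values, quantile_levels))

# Toy example: two models' road-temperature quantiles at levels
# 0.1, 0.5, 0.9 (degrees C, invented values).
levels = [0.1, 0.5, 0.9]
per_model = [[1.0, 2.0, 3.0],
             [3.0, 4.0, 5.0]]
combined = combine_quantiles(per_model)          # aggregate quantiles
p_below_3 = predictive_cdf(levels, combined, 3.0)
```

The interpolated CDF is what supports decision-making queries such as "what is the probability the road surface falls below freezing?".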